Abstract

A central problem in releasing aggregate information about sensitive data is to do so accurately while 
providing a privacy guarantee on the output. Recent work focuses on the class of linear queries, which 
include basic counting queries, data cubes, and contingency tables. The goal is to maximize the utility 
of their output, while giving a rigorous privacy guarantee. Most results follow a common template: pick 
a "strategy" set of linear queries to apply to the data, then use the noisy answers to these queries to 
reconstruct the queries of interest. This entails either picking a strategy set that is hoped to be good for 
the queries, or performing a costly search over the space of all possible strategies. 

In this paper, we propose a new approach that balances accuracy and efficiency: we show how 
to improve the accuracy of a given query set by answering some strategy queries more accurately than 
others. This leads to an efficient optimal noise allocation for many popular strategies, including wavelets, 
hierarchies, Fourier coefficients and more. For the important case of marginal queries we show that this 
strictly improves on previous methods, both analytically and empirically. Our results also extend to 
ensuring that the returned query answers are consistent with an (unknown) data set at minimal extra cost 
in terms of time and noise. 

1 Introduction 

The long-term goal of much work in data privacy is to enable the release of information that accurately 
captures the behavior of an input data set, while preserving the privacy of individuals described therein. 
There are two central, interlinked questions to address around this goal: what privacy properties should the 
transformation process possess, and how can we ensure that the output is useful for subsequent analysis 
and processing? The model of Differential Privacy has lately gained broad acceptance as a criterion for 
private data release 0|9l. There are now multiple different methods which achieve Differential Privacy 
over different data types ||T]|2j|4j0[TTl[T2j[l4l[T3|T3|23l. Some provide a strong utility guarantee, while 
others demonstrate their utility via empirical studies. These algorithms also vary from the highly practical, 
to taking time exponential in the data size. 

The output of the data release should be compatible with existing tools and processes in order to provide 
usable results. The model of contingency tables is universal, in that any relation can be represented exactly 
in this form. That is, the contingency table of a dataset over a subset of attributes contains, for each possible 
attribute combination, the number of tuples that occur in the data with that set of attribute values. In this 
paper, we call such a contingency table the marginal of the database over the respective subset of attributes. 
The set of all possible marginals for a relation is captured by the data cube. Contingency tables and the data
cube in turn are examples of a more general class of linear queries, i.e., each query is a linear combination
of the entries of the contingency table over all attributes in the input relation.

Figure 1: Example contingency table, query matrix, with strategy and recovery matrices. Panels include (c) Strategy matrix S and (d) Recovery matrix R.

There has been much interest in providing methods to answer such linear queries with privacy guarantees.
In this paper, we argue that these all fit within a general framework: answer some set of queries S over
the data (not necessarily the set that was requested), with appropriate noise added to provide the privacy,
then use the answers to answer the given queries. A limitation of prior work is that it applies uniform noise to
the answers to S: the same magnitude of noise is added to each query. However, it turns out that the accuracy
can be much improved by using non-uniform noise: using different noise for each answer, while providing
the same overall guarantee. The main contribution of this paper is to provide a full formal understanding of
this problem and the role that non-uniform noise can play.

Example. Figure 1(a) shows a table with 3 binary attributes A, B and C. As in prior work [16], we think of
a database D as an N-dimensional vector $x \in \mathbb{R}^N$, where N is the domain size of D; i.e., if D has attributes
$A_1, \ldots, A_d$, then $N = \prod_{i=1}^{d} |A_i|$. We linearize the domain of D, so that each index position i, $1 \le i \le N$,
corresponds to a unique combination a of attribute values, and $x_i$ is the number of tuples in D that have
values a. In Figure 1(a) we linearized the domain in the order 000, 001, . . . , 111. Here, position i = 2
corresponds to the combination of values a = 001. Thus, $x_2 = 2$ since D contains two tuples (1 and 4) with
these values.

Suppose that we want to compute two marginals over D: the marginal over A, and the marginal over
A, B. The query marginals can be represented as a matrix Q, as depicted in Figure 1(b), so that the answer
is Qx. The first two rows compute the marginal over A; i.e., the first row is the linear query that counts all
tuples t with t.A = 0, while the second row counts all tuples with t.A = 1. Similarly, the third row counts
all tuples with t.A = 0 and t.B = 0, and so on. Differentially private mechanisms answer Q in the form of
y = Qx + r, where r is a random vector whose distribution provides a certain level of privacy. The error of
the answer is generally defined as the variance Var(y) [16, 23].

For example, one way to provide $\varepsilon$-differential privacy adds uniform noise to each answer. Based on the
structure of Q in Figure 1(b), we can add noise with variance $8/\varepsilon^2$ to each answer (see details in Section 2).
Over the six queries, the sum of variances is $48/\varepsilon^2$. However, we can do better with a non-uniform approach.






For example, we can add noise with variance $2(\frac{9}{4\varepsilon})^2$ to the answers for the first two rows of Q, and noise
with variance $2(\frac{9}{5\varepsilon})^2$ to the remaining four answers, and still provide $\varepsilon$-differential privacy. The sum of the
six variances is then $2 \cdot 2(\frac{9}{4\varepsilon})^2 + 4 \cdot 2(\frac{9}{5\varepsilon})^2 = 46.17/\varepsilon^2$. We can improve this even further by changing how
we answer the queries: we can answer the first query $Q_1$ by taking half of the first answer, and adding half
of the third and fourth answers. The resulting variance of $Q_1$ is

$$\tfrac{1}{4} \cdot 2\left(\tfrac{9}{4\varepsilon}\right)^2 + \tfrac{1}{4} \cdot 2\left(\tfrac{9}{5\varepsilon}\right)^2 + \tfrac{1}{4} \cdot 2\left(\tfrac{9}{5\varepsilon}\right)^2 = 5.77/\varepsilon^2.$$

Similar tricks yield the same variance for all other answers, so the sum of all six variances is now $34.6/\varepsilon^2$,
a 28% reduction over the uniform approach. □
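
The arithmetic of this example can be checked with a few lines of Python. This is only a sketch under stated assumptions: ε is normalized to 1, the per-row budgets are taken to be 4ε/9 and 5ε/9 (values that sum to ε and reproduce the totals quoted above), and the helper name `laplace_variance` is ours.

```python
# Arithmetic check for the running example, with eps normalized to 1.
# Assumption: the non-uniform budgets are 4/9 and 5/9 of eps (they sum to eps).

def laplace_variance(budget):
    """Variance of Laplace noise with scale 1/budget."""
    return 2.0 / budget ** 2

# Uniform noise: Q has L1 sensitivity 2, so every answer uses scale 2/eps,
# which is the same as giving each of the six rows a budget of eps/2.
uniform_total = 6 * laplace_variance(1 / 2)

# Non-uniform noise: rows 1-2 share one budget, rows 3-6 share another.
eps_a, eps_ab = 4 / 9, 5 / 9
nonuniform_total = 2 * laplace_variance(eps_a) + 4 * laplace_variance(eps_ab)

# Recombination: Q_1 = (answer 1)/2 + (answer 3)/2 + (answer 4)/2.
q1_variance = 0.25 * laplace_variance(eps_a) + 2 * 0.25 * laplace_variance(eps_ab)
combined_total = 6 * q1_variance  # the text states all six answers match Q_1

print(uniform_total, nonuniform_total, q1_variance, combined_total)
```

The ratio `combined_total / uniform_total` is about 0.72, which is the 28% reduction reported in the example.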
This example shows that we can significantly improve the accuracy of our answers while preserving the 
same level of privacy by adopting non-uniform noise and careful combination of intermediate answers to 
give the final answer. Yet further improvement can result by choosing a different set of queries to obtain 
noisy answers to. The problem we address in this paper is how to use these techniques to efficiently and 
accurately provide answers to such queries Q that meet the differential privacy guarantee. This captures the 
core problems of releasing data cubes, contingency tables and marginals. Our results are more general, as 
they apply to arbitrary sets of linear queries Q, but our focus is on these important special cases. We also 
discuss how to additionally ensure that the answers meet certain consistency criteria. Next, we study how 
existing techniques can be applied to this problem, and discuss their limitations. 

The Strategy/Recovery approach. Mechanisms for minimizing the error of linear counting queries under 
differential privacy have attracted a lot of attention. Work in the theory community [2, 10, 11, 12, 13, 22]
has focused on providing the best bounds on noise for an arbitrary set of such queries, in both the online and 
offline setting. However, these mechanisms are rarely practical for large databases with moderately high 
dimensionality: they can scale exponentially with the size of their input. 

Work in database research has aimed to deliver methods that scale to realistic data sizes. Much of this
work builds on basic primitives in differential privacy such as adding appropriately scaled noise to a numeric
quantity from a specific random distribution (see Section 2). By repeating this process for multiple different
quantities, and reasoning about how the privacy guarantees compose, it is possible to ensure that the full
output meets the privacy definition. The goal is then to minimize the error introduced into the query answers
(as measured by their variance) while satisfying the privacy conditions.

Given this outline, we observe that the bulk of methods using noise addition fit into a two-step framework 
that we dub the 'strategy/recovery' approach: 

• Step 1. Find a strategy matrix S and compute the vector z = Sx + ν, where ν is a random noise
vector drawn from an appropriate distribution. Then z is the differentially private answer to the queries
represented by S.

• Step 2. Compute a recovery matrix R, such that Q = RS. Return y = Rz as the differentially private 
answer to the queries Q. The variance Var(y) is often used as an error measure for the approach. 
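
The two steps above can be sketched in plain Python. The matrices below are our reading of the decomposition described for Figure 1 (strategy = marginal over A, B; recovery = aggregate to A and copy A, B); the data vector `x` and all function names are hypothetical, and uniform Laplace noise is used for simplicity.

```python
import random

def matvec(M, v):
    """Matrix-vector product for list-of-lists matrices."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def matmul(A, B):
    """Matrix-matrix product for list-of-lists matrices."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def laplace(scale, rng):
    # The difference of two exponentials with mean `scale` is Laplace(0, scale).
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)

# Strategy S: the marginal over (A, B); domain order 000, 001, ..., 111.
S = [[1, 1, 0, 0, 0, 0, 0, 0],
     [0, 0, 1, 1, 0, 0, 0, 0],
     [0, 0, 0, 0, 1, 1, 0, 0],
     [0, 0, 0, 0, 0, 0, 1, 1]]
# Recovery R: two rows aggregate cells into the marginal over A,
# four rows copy the marginal over (A, B).
R = [[1, 1, 0, 0],
     [0, 0, 1, 1],
     [1, 0, 0, 0],
     [0, 1, 0, 0],
     [0, 0, 1, 0],
     [0, 0, 0, 1]]
Q = matmul(R, S)  # the workload answered by this decomposition

def strategy_recovery(x, eps, rng):
    # Every column of S holds exactly one 1, so its L1 sensitivity is 1.
    scale = 1.0 / eps
    z = [s + laplace(scale, rng) for s in matvec(S, x)]  # Step 1: noisy strategy answers
    return matvec(R, z)                                  # Step 2: recover the workload

x = [1, 2, 0, 1, 3, 0, 1, 0]  # hypothetical data vector
y = strategy_recovery(x, eps=1.0, rng=random.Random(0))
```

Because both marginals are derived from the same noisy cells, the returned A-marginal always equals the sum of the corresponding A, B cells.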



We show this method schematically in Figure 2. For example, Figures 1(c) and 1(d) show a possible
choice of matrices S and R for the query matrix Q in Figure 1(b). In this case, the strategy S computes the
marginal on A, B; Step 1 above adds random noise independently to all cells in this marginal. The recovery
R computes the marginal on A by aggregating the corresponding noisy cells from the marginal on A, B (the
first two rows of R), and also outputs the marginal on A, B (the last four rows of R).

We now show how prior work fits into this approach. In many cases, the first step directly picks a
fixed matrix for S, by arguing that this is suitable for a particular class of queries Q. For example, when
setting S = I (hence R = Q), the approach computes a set of noisy counts $\tilde{x}_i$ by adding Laplace noise
independently to each $x_i$. The answer to any query matrix Q is computed over these noisy counts, i.e.,
$y = Q\tilde{x}$; this model was analyzed in [1]. By contrast, when S = Q (and R = I), as discussed in [7], the
approach adds noise to the result of each query in Q, i.e., y = Qx + ν.

Figure 2: Framework of prior work.

Several more sophisticated strategies have been designed, with the goal of minimizing the error Var(y)
for various query workloads. When Q consists of low-dimensional range queries, [23] proposes S to be
the wavelet transform, while [14] studies the strategy S corresponding to a hierarchical structure over x.
However, as shown in [16], neither of these strategies is particularly accurate for other types of queries.
For marginals, [1] chooses S to be the Fourier transform matrix, and [6] employs a clustering algorithm
over the queries to compute S. Figures 1(c) and 1(d) depict the output computed via [6] on query matrix Q
(Figure 1(b)). Other work has suggested the use of random projections as the strategy matrix, connecting
to the area of sparse recovery [5, 18]. Many of these choices are relatively fast: that is, S and R can be
applied to a vector of length N in time O(N) or O(N log N) in the case of wavelet and Fourier transforms,
respectively. This is important, since real data can have large values of N, and so asymptotically higher
running time may not be practical. A limitation of [6] is that the clustering step is very expensive, limiting
the scalability of the approach.

An important technical distinction for the strategy/recovery approach is whether or not the strategy S is
invertible. If it is (e.g., when S is the Fourier or wavelet transform), then the recovery matrix $R = QS^{-1}$ is
unique, and the query answer y is guaranteed to be consistent (see Definition 2.3). Then the error measure
Var(y) depends only on S (and Q). However, if S is not invertible, then there can be many choices for R,
and Var(y) depends on both S and R. The optimal recovery R that minimizes Var(y) (for a fixed S) can
be computed via the least squares method [14, 16], and Var(y) has a closed-form expression as a function of
S. Using this fact, Li et al. [16] study the following optimization problem: Given queries Q and a formula
for Var(y) as a function of S, compute the strategy S that minimizes Var(y). This is a hard optimization
problem, since the search is over all possible strategy matrices S. Their matrix mechanism uses a rank-constrained
semidefinite program (SDP) to compute the optimal S. Solving this SDP is very costly as a function of N,
making it impractical for data with more than a few tens of entries.

In summary, the search for a strategy matrix S is currently done either by picking one that we think 
is likely to be "good" for queries Q, or by solving an SDP, which is impractical even for moderate size 
problems. 

Our Contributions. Most of the prior approaches discussed above use the uniform "noise budgeting" strategy,
i.e., each value $\nu_i$ of the noise vector is (independently) drawn from the same random distribution. The
scaling parameter of this distribution depends on the desired privacy guarantee $\varepsilon$, as well as the "sensitivity"
of the strategy matrix S (see Section 2).

In the extended version of [16], the authors prove that any non-uniform noise budgeting strategy can be
reduced to a uniform budgeting strategy by scaling the rows of S with different factors. However, computing
the optimal scaling factors this way is impractical, as it requires solving an SDP. The only efficient method
for computing non-uniform noise budgets we are aware of applies to the special case when Q is a range
query workload [4]. There, S corresponds to a multi-dimensional hierarchical decomposition, and recovery
R corresponds to the greedy range decomposition. The resulting budgeting is not always optimal.

In this paper we show how to compute the optimal noise budgets in time at most linear in the sizes of R 
and S, for a large class of queries Q (including marginal queries), and for most of the matrices S considered 
in prior work. This includes the Fourier transform, the wavelet transform, the hierarchical structure over x, 
and any strategy consisting of a set of marginals (in particular, the clustering strategy of [6]). 
Figure 3: Our proposed framework.

The overall framework introduced is depicted in Figure 3. Given strategy matrix S and recovery matrix
R, we compute optimal noise budgets $\varepsilon_i$ for each query, and draw each noise value $\nu_i$ from a random
distribution that depends on $\varepsilon_i$ (Step 2). We then derive a new recovery matrix R that minimizes Var(y)
(Step 3), for the noise budgets computed in Step 2.

The most general approach would be to provide a mathematical formulation for the following global
optimization problem: Given the query matrix Q, compute the strategy S, the recovery R, and the noise
budgets $\varepsilon_i$ that minimize Var(y). However, this problem essentially reduces to that addressed by the matrix
mechanism [16], and requires solving an SDP.

Instead, we study how to efficiently solve optimization problems where two out of the three parameters
S, R and $\{\varepsilon_i\}_i$ are fixed. In Section 3.1 we solve the optimization problem: Given a decomposition of
query matrix Q into strategy S and recovery R, compute the optimal noise budgets $\varepsilon_i$ that minimize Var(y).
We provide a formula for Var(y), as a function of S and R. In Section 3.2 we apply the generalized least
squares method to solve the following problem: Given the query matrix Q, the strategy S, and the noise
budgets $\varepsilon_i$, compute the recovery R that minimizes Var(y). Following the steps in this framework provides
efficient algorithms with low error. A faster alternative computes a consistent output y of Step 3 with small
(but non-optimal) error; see Sections 3.3 and 4.3. Our approach strictly improves over the previous result
from 

In the common case that S is invertible, our framework decreases the error for the Fourier and wavelet 
approaches from prior work. Computing the optimal noise budgets here is very fast, so this improvement 
comes with only a small time overhead: less than 1 second in our experiments. 

To summarize, our contributions are as follows: 

• We propose a framework for minimizing the error of differentially private answers. It improves on the 
accuracy of existing strategies, at minimal computation cost. 

• We develop fast algorithms within this framework for marginal queries. Our algorithms compute
consistent answers. In particular, when Q is the set of all k-way marginals, we give asymptotic bounds
on the error of our mechanism; we are not aware of any such analysis for the matrix mechanism. As
a by-product, our analysis also improves the error bound for the uniform noise case.

• We conduct an extensive experimental study on marginal query workloads and show that our framework
reduces the error of existing strategies (including the Fourier strategy [1] and the Cluster strategy [6]).

Organization. Section 2 introduces the necessary definitions for describing our framework. The optimization
results required by Steps 2 and 3 are developed in Section 3. In Section 4 we describe novel results
that allow us to apply our framework to marginal queries in an efficient manner, and to compute consistent
results. Our experimental study is presented in Section 5, and we conclude in Section 6.

2 Definitions 

We begin by recalling the definition of differential privacy and some fundamental mechanisms which satisfy 
this definition. 
Definition 2.1 (Differential privacy [10, 11]). A randomized algorithm $\mathcal{A}$ satisfies $(\varepsilon, \delta)$-differential privacy if
for all databases $D_1$ and $D_2$ differing in at most one element, and all measurable subsets $S \subseteq \mathrm{Range}(\mathcal{A})$,

$$\Pr[\mathcal{A}(D_1) \in S] \le e^{\varepsilon} \cdot \Pr[\mathcal{A}(D_2) \in S] + \delta.$$

We say that an algorithm satisfies $\varepsilon$-differential privacy if it satisfies $(\varepsilon, 0)$-differential privacy.

Definition 2.2 ($L_p$-sensitivity). For $p \ge 1$ let the $L_p$-sensitivity $\Delta_p(f)$ of a function $f : D \to \mathbb{R}^q$ be defined
as:

$$\Delta_p(f) = \max \|f(D_1) - f(D_2)\|_p,$$

for all $D_1$ and $D_2$ differing in at most one element. Here, $\|\cdot\|_p$ denotes the standard $L_p$ norm, i.e., $\|x\|_p =
(\sum_{i=1}^{q} |x_i|^p)^{1/p}$ for $x \in \mathbb{R}^q$.

We rely on the following two basic mechanisms to construct differentially private algorithms: 
Theorem 2.1 (Laplace mechanism [9]). If f is a function $f : D \to \mathbb{R}^q$, then releasing f with additive q-dimensional
Laplace noise with variance $2\Delta_1(f)^2/\varepsilon^2$ in each component satisfies $\varepsilon$-differential privacy.

Theorem 2.2 (Gaussian mechanism HHHl). If f is a function $f : D \to \mathbb{R}^q$, then releasing f with additive q-dimensional
Gaussian noise with variance $2\Delta_2(f)^2 \log(2/\delta)/\varepsilon^2$ in each component satisfies $(\varepsilon, \delta)$-differential
privacy.
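
As a minimal sketch of the Laplace mechanism, assuming the variance $2\Delta_1(f)^2/\varepsilon^2$ per component (i.e., Laplace scale $\Delta_1(f)/\varepsilon$), one might write the following. The function names are ours, and the sampler relies on the standard fact that the difference of two independent exponentials is Laplace-distributed.

```python
import random

def laplace_scale(l1_sensitivity, eps):
    """Scale b of the Laplace noise; the variance of Laplace(0, b) is 2 * b**2."""
    return l1_sensitivity / eps

def laplace_variance(l1_sensitivity, eps):
    """Per-component variance added by the mechanism."""
    return 2.0 * laplace_scale(l1_sensitivity, eps) ** 2

def laplace_mechanism(true_answers, l1_sensitivity, eps, rng=None):
    """Add independent Laplace noise to every component of the true answer."""
    rng = rng or random.Random()
    b = laplace_scale(l1_sensitivity, eps)
    return [a + rng.expovariate(1 / b) - rng.expovariate(1 / b)
            for a in true_answers]
```

For instance, a workload with $L_1$ sensitivity 2 answered with ε = 1 receives noise of variance `laplace_variance(2, 1.0)`, i.e. 8 per answer.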

Query workloads, consistency, strategy and recovery. As mentioned in Section 1, we represent the
database as a vector $x \in \mathbb{R}^N$ and the query workload as a matrix $Q \in \mathbb{R}^{q \times N}$: each row $Q_{i\cdot}$, $1 \le i \le q$, is a
linear query over database x. It is easy to see that the sensitivity of Q is $\Delta_p(Q) = \max_{j=1}^{N} \|Q_{\cdot j}\|_p$, where
$Q_{\cdot j}$ denotes the jth column of Q.¹ One differentially private answer to Q is a vector y = Qx + r, where
$r \in \mathbb{R}^q$ is the noise vector drawn from an appropriate (Laplace or Gaussian) distribution. Our formal goal
is to minimize the variance of a given linear functional $a^T \cdot \mathrm{Var}(y)$ for some fixed vector $a \in \mathbb{R}_+^q$, while
guaranteeing differential privacy. For example, if a = 1 we minimize the sum of the variances of noise
over all queries. In particular, we study workloads Q that consist of marginals over x, such as the set of all
k-way marginals, for some small integer k.
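
The column-norm characterization of sensitivity is easy to evaluate directly. The sketch below assumes the workload is stored as a list of rows; the matrix `Q` is our reading of the two-marginal workload from the introductory example (marginal over A, then marginal over A, B), and `lp_sensitivity` is a name we introduce.

```python
def lp_sensitivity(Q, p):
    """Largest L_p norm over the columns of the query matrix Q."""
    return max(sum(abs(v) ** p for v in col) ** (1.0 / p) for col in zip(*Q))

# Marginal over A (2 rows) followed by the marginal over (A, B) (4 rows),
# with the domain linearized as 000, 001, ..., 111.
Q = [[1, 1, 1, 1, 0, 0, 0, 0],
     [0, 0, 0, 0, 1, 1, 1, 1],
     [1, 1, 0, 0, 0, 0, 0, 0],
     [0, 0, 1, 1, 0, 0, 0, 0],
     [0, 0, 0, 0, 1, 1, 0, 0],
     [0, 0, 0, 0, 0, 0, 1, 1]]

l1 = lp_sensitivity(Q, 1)  # each tuple affects one cell in each marginal
l2 = lp_sensitivity(Q, 2)
```

Every column of this Q contains exactly two ones, so the $L_1$ sensitivity is 2 and the $L_2$ sensitivity is $\sqrt{2}$.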

Definition 2.3. A noisy output y = Qx + r is consistent if there exists at least one vector $x_c$ such that
$y = Qx_c$.

We decompose a query workload Q into a strategy matrix $S \in \mathbb{R}^{m \times N}$, and a recovery matrix $R \in \mathbb{R}^{q \times m}$,
such that Q = RS. The query answer y is then computed as y = Rz, where z = Sx + ν is the noisy answer
to S (hence, r = Rν). In general, there are many possible ways to pick R and S given Q, and our goal will
be to minimize the resulting Var(y).

3 Our Framework 

In this section we solve the optimization problems required by Steps 2 and 3 of our framework from Figure 3.

3.1 Optimal Noise Budgeting (Step 2)

A novel part of our scheme is a special purpose budgeting mechanism: For each row $S_{i\cdot}$ in the strategy S,
we release $z_i = S_{i\cdot}x + \nu_i$, where $\nu_i$ is drawn from a Laplace distribution that depends on a value $\varepsilon_i$. We
show how to choose the values $\varepsilon_i$ optimally so that the overall method satisfies $\varepsilon$-differential privacy and the
resulting noise is minimized. We also design an approach based on grouping rows of the strategy matrix S,
which allows us to compute the optimal $\varepsilon_i$'s efficiently.

¹ We assume that each individual contributes a weight of 1 to some entry of x, in line with prior work. Other cases can be
handled by rescaling the sensitivity accordingly.

Proposition 3.1. Let S be an m × N strategy matrix, and let $\varepsilon_1, \ldots, \varepsilon_m$ be a set of m non-negative values.
Define the noisy answer to S to be an m-dimensional vector z such that $z_i = S_{i\cdot}x + \nu_i$, $1 \le i \le m$.

(i) If $\nu_i$ is drawn from the Laplace distribution with variance $2/\varepsilon_i^2$, then z satisfies $\alpha$-differential privacy,
where $\alpha = \max_{j=1}^{N} \left( \sum_{i=1}^{m} |S_{ij}| \varepsilon_i \right)$.

(ii) If $\nu_i$ is drawn from the Gaussian distribution with variance $2 \log(2/\delta)/\varepsilon_i^2$, and $\alpha = \max_{j=1}^{N} \sqrt{\sum_{i=1}^{m} S_{ij}^2 \varepsilon_i^2}$, then
z satisfies $(\alpha, \delta)$-differential privacy.

Proof. We only show (i); the proof for (ii) is shown in |X]. We decompose S as $D^{-1}DS$ where D is the
diagonal matrix $D = \mathrm{diag}(\varepsilon_1, \ldots, \varepsilon_m)$. We now consider the $L_p$ sensitivity of the function $f(x) = (DS)x$.
From Definition 2.2 we have

$$\Delta_p(f) \le \max_{j=1}^{N} \|(DS)_{\cdot j}\|_p = \max_{j=1}^{N} \Big( \sum_{i=1}^{m} |S_{ij} \varepsilon_i|^p \Big)^{1/p}.$$

Thus, adding noise with variance proportional to $(\Delta_p(f)/\alpha)^2$ provides $\alpha$-differential privacy (via Theorem 2.1
with p = 1) or $(\alpha, \delta)$-differential privacy (via Theorem 2.2 with p = 2). Finally, multiplying by $D^{-1}$ has
the effect of rescaling the variance in each component: the ith component now has variance proportional
to $\big(\frac{\Delta_p(f)}{\alpha \varepsilon_i}\big)^2$. Setting $\alpha = \Delta_p(f)$ for p = 1 or p = 2 and applying the correct scaling constants gives the
claimed result. □



The proofs for (ii) and other results are omitted for brevity. 
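
To make the role of the per-row budgets concrete, the following sketch evaluates the privacy level obtained from a given budget vector. It assumes the form $\alpha = \max_j \sum_i |S_{ij}| \varepsilon_i$ for Laplace noise and $\alpha = \max_j (\sum_i S_{ij}^2 \varepsilon_i^2)^{1/2}$ for Gaussian noise, matching the column-norm sensitivity of Section 2; the function names and the example budgets are ours.

```python
import math

def privacy_level_laplace(S, budgets):
    """alpha = max over columns j of sum_i |S[i][j]| * budgets[i]."""
    return max(sum(abs(s) * e for s, e in zip(col, budgets)) for col in zip(*S))

def privacy_level_gaussian(S, budgets):
    """alpha = max over columns j of sqrt(sum_i (S[i][j] * budgets[i])**2)."""
    return max(math.sqrt(sum((s * e) ** 2 for s, e in zip(col, budgets)))
               for col in zip(*S))

# Strategy S = Q from the running example: marginal over A, then over (A, B).
S = [[1, 1, 1, 1, 0, 0, 0, 0],
     [0, 0, 0, 0, 1, 1, 1, 1],
     [1, 1, 0, 0, 0, 0, 0, 0],
     [0, 0, 1, 1, 0, 0, 0, 0],
     [0, 0, 0, 0, 1, 1, 0, 0],
     [0, 0, 0, 0, 0, 0, 1, 1]]

# Hypothetical non-uniform budgets: 4/9 for the A rows, 5/9 for the (A, B) rows.
budgets = [4 / 9, 4 / 9, 5 / 9, 5 / 9, 5 / 9, 5 / 9]
alpha = privacy_level_laplace(S, budgets)
```

Each column of S touches one row of each marginal, so the two budgets add up and the overall Laplace privacy level is 1 for these values.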

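Recall that the output is computed as y = Rz. Our goal is to choose values $\varepsilon_i$ that minimize the variance
$a^T \mathrm{Var}(y) = a^T \mathrm{Var}(R\nu)$. We detail this for the Laplace mechanism:

$$a^T \cdot \mathrm{Var}(y) = 2 \sum_{j=1}^{q} a_j \sum_{i=1}^{m} \frac{R_{ji}^2}{\varepsilon_i^2} = 2 \sum_{i=1}^{m} \frac{1}{\varepsilon_i^2} \sum_{j=1}^{q} a_j R_{ji}^2.$$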



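Let $b_i = 2 \sum_{j=1}^{q} a_j R_{ji}^2$. From Proposition 3.1 it follows that the optimal noise budgeting $\{\varepsilon_i\}$ is the solution
to the following optimization problem:

Minimize: $\sum_{i=1}^{m} b_i / \varepsilon_i^2$ (1)
Subject to: $\sum_{i=1}^{m} |S_{ij}| \varepsilon_i \le \varepsilon$, $1 \le j \le N$ (2)
$\varepsilon_i \ge 0$, $1 \le i \le m$ (3)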

Because all $b_i$'s are non-negative, the objective function is convex. The body defined by the linear inequalities
is also convex. The resulting problem can thus be solved using a convex optimization package that
implements, e.g., interior point methods. Such methods require time polynomial in m, N, and the required 
accuracy of the solution ll2"TTl . 
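
Before handing the problem to a solver, it can help to evaluate the objective and the constraints for a candidate budget vector. The sketch below assumes the weights $b_i = 2 \sum_j a_j R_{ji}^2$, the objective $\sum_i b_i/\varepsilon_i^2$ and the constraints $\sum_i |S_{ij}| \varepsilon_i \le \varepsilon$; the function names and the sample matrices are ours.

```python
def budget_weights(R, a):
    """b_i = 2 * sum_j a[j] * R[j][i]**2, one weight per strategy row i."""
    m = len(R[0])
    return [2.0 * sum(a[j] * R[j][i] ** 2 for j in range(len(R)))
            for i in range(m)]

def objective(b, budgets):
    """Weighted total variance sum_i b_i / eps_i**2 under Laplace noise."""
    return sum(bi / e ** 2 for bi, e in zip(b, budgets))

def is_feasible(S, budgets, eps, tol=1e-12):
    """Check the privacy constraint for every column j and eps_i >= 0."""
    if any(e < 0 for e in budgets):
        return False
    return all(sum(abs(s) * e for s, e in zip(col, budgets)) <= eps + tol
               for col in zip(*S))

# Strategy = marginal over (A, B); recovery aggregates to A and copies (A, B).
S = [[1, 1, 0, 0, 0, 0, 0, 0],
     [0, 0, 1, 1, 0, 0, 0, 0],
     [0, 0, 0, 0, 1, 1, 0, 0],
     [0, 0, 0, 0, 0, 0, 1, 1]]
R = [[1, 1, 0, 0], [0, 0, 1, 1],
     [1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]

b = budget_weights(R, a=[1] * 6)      # every strategy row is used twice
total = objective(b, [1.0] * 4)       # uniform budgets with eps = 1
```

With these matrices each $b_i$ equals 4, and the uniform budget $\varepsilon_i = 1$ is feasible for ε = 1 and gives a total variance of 16.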

Efficient Solution via Grouping. Convex optimization solvers may require a large number of iterations
and be too inefficient for databases of moderate dimensionality. However, for most of the frequently used
strategy matrices, the optimization problem can be significantly simplified, if we partition the rows of the
strategy matrix S into groups, and define the corresponding values $\varepsilon_i$ to be the same for all rows in a group.
We show that the groups can be chosen in such a way that all conditions $\sum_{i=1}^{m} |S_{ij}| \varepsilon_i \le \varepsilon$ become identical
once we set the $\varepsilon_i$'s to be equal in each group, which leads to a closed form solution. This approach was
implicitly used in [4]. We show that this concept can be applied to a larger class of strategy matrices. The
optimal solution for the simplified problem is a feasible solution for the general problem. If recovery matrix
R satisfies a certain property (as is the case for all matrices we consider), then the optimal solution for
the simplified problem is also guaranteed optimal for the general case. In particular, we find optimal noise
budgets for strategy/recovery methods such as Fourier [1] and clustering [6].

Definition 3.1. Let S be an m × N strategy matrix. We say that S satisfies the grouping property if there
exists a grouping function over its rows $G : [m] \to [g]$, $g \le m$, such that the following two conditions are
satisfied:

• row-wise disjointness: for any two rows $i_1, i_2$ of S with $G(i_1) = G(i_2)$ and for any column j, $S_{i_1 j} S_{i_2 j} = 0$;

• bounded column norm: for any group r, and for any two columns $j_1, j_2$, we have $\max_{i:G(i)=r} |S_{i j_1}| =
\max_{i:G(i)=r} |S_{i j_2}| = C_r$.

The minimum g for which S has a grouping function G is called the grouping number of S.

Together, the two conditions in Definition 3.1 imply that any column of S contains at most one non-zero
value from each group, and that that value is the same (within a group) for all columns. Hence, not every
S can meet this definition: while we could put every row in a singleton group, we also then require that the
magnitudes of all non-zero entries in the row are identical. Nevertheless, as we show below, many commonly
used matrices are groupable.

Example. Matrix S in Figure 1(c) has grouping number g = 1: each column has exactly one entry equal to
1, so $C_1 = 1$. On the other hand, if S = Q is the matrix in Figure 1(b), the grouping number is 2: we define
one group containing the first two rows, and another containing the last four rows. We have $C_1 = C_2 = 1$.
Note that, e.g., the first and third rows cannot be grouped together, since $Q_{11}Q_{31} = 1 \ne 0$. We now apply
this definition to the other strategy matrices proposed:

Base counts. As noted in the introduction, directly materializing the noisy version of x is equivalent to
S = I. In this case, all rows form a single group; hence, g = 1 and $C_1 = 1$.

Collections of marginals. When S is a set of marginals, all rows that compute the cells in the same marginal
can be grouped together, as in the above example. Hence, the number of groups g is the number of marginals
computed; and $C_r = 1$ for each group r.

Hierarchical structures. When S represents a hierarchy over x, all rows that compute the counts at the
same level in the hierarchy form one group. Hence, the grouping number g is the depth of the hierarchy
and all $C_r$ values are 1. Specifically, when S represents a binary tree over x, the grouping number is $g =
\lceil \log_2 N \rceil$. The same essentially holds for the one-dimensional Haar wavelet (here, $g = \lceil \log_2 N \rceil + 1$). For
higher dimensional wavelets, the grouping number grows exponentially with the dimension of the wavelet
transform.

Fourier transform. The Fourier transform (discussed in more detail in Section 4.1) is dense: every entry is
non-zero and has absolute value $2^{-d/2}$. In this case, each row forms its own group, the grouping number is
N, and $C_r = 2^{-d/2}$ for any group r.

Sparse random projections. Sketches are sparse random projections that partition the data x into buckets,
repeated t times [2]. All entries in the sketch matrix S are in {−1, 0, +1}. In this case, all rows that define one
particular partition of the data form one group, so g = t and $C_r = 1$.

Arbitrary strategies S. If S is groupable, we can greedily find a grouping as follows: start a group with an 
arbitrary row, and try to add each remaining row to existing groups; if a row cannot be added to an existing 
group, a new group is created for it. While this may not result in a minimum g, any grouping suffices for our 
purposes. We do not discuss the greedy approach further, since all the strategies we study can be grouped 
directly as discussed above. 
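
The greedy procedure just described can be sketched as follows. This is an illustration under our own naming: `greedy_grouping` enforces only row-wise disjointness while building groups, and `group_constants` then checks the bounded column norm condition and returns the values $C_r$, raising an error if the greedy grouping does not satisfy it.

```python
def greedy_grouping(S):
    """Assign each row to the first group whose rows share no column with it."""
    groups = []
    for i, row in enumerate(S):
        support = {j for j, v in enumerate(row) if v != 0}
        for g in groups:
            if not (support & g["support"]):   # row-wise disjointness holds
                g["rows"].append(i)
                g["support"] |= support
                break
        else:                                  # no compatible group: open a new one
            groups.append({"rows": [i], "support": set(support)})
    return [g["rows"] for g in groups]

def group_constants(S, groups, tol=1e-12):
    """Return C_r per group, or raise if the bounded column norm fails."""
    n = len(S[0])
    constants = []
    for rows in groups:
        col_max = [max(abs(S[i][j]) for i in rows) for j in range(n)]
        if max(col_max) - min(col_max) > tol:
            raise ValueError("bounded column norm condition is violated")
        constants.append(col_max[0])
    return constants

# S = Q from the running example: marginal over A, then over (A, B).
Q = [[1, 1, 1, 1, 0, 0, 0, 0],
     [0, 0, 0, 0, 1, 1, 1, 1],
     [1, 1, 0, 0, 0, 0, 0, 0],
     [0, 0, 1, 1, 0, 0, 0, 0],
     [0, 0, 0, 0, 1, 1, 0, 0],
     [0, 0, 0, 0, 0, 0, 1, 1]]
groups = greedy_grouping(Q)
constants = group_constants(Q, groups)
```

On this matrix the greedy pass recovers the two groups from the example above, one per marginal, each with constant 1.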
Definition 3.2. Let S be an m × N strategy matrix with grouping function G. Let R be a corresponding
q × m recovery matrix. We say that R is consistent with G if for any rows $i_1, i_2$ of S with $G(i_1) = G(i_2)$,
we have $b_{i_1} = b_{i_2}$ (where $b_i = 2 \sum_{j=1}^{q} a_j R_{ji}^2$ are as in objective function (1)).

When Q is a set of marginals and a = 1, it is easy to verify that R is consistent with the optimal
grouping of S, for all the choices of S considered in prior work: S = I, S = Q, S = Fourier transform,
and S = strategy marginals computed by clustering [6] (here, R aggregates cells of the centroid marginal to
compute each of the marginals assigned to a cluster).

The next result follows directly from the properties of the grouping function. 

Lemma 3.2. Let S be a strategy matrix with grouping function G. There is a feasible solution to the optimization
problem (1)-(3) such that for each group r and for all pairs of rows $i_1, i_2$ with $G(i_1) = G(i_2) = r$,
we have $\varepsilon_{i_1} = \varepsilon_{i_2}$. Moreover, all privacy conditions (2) are equivalent, and can be satisfied with equality.
If R is consistent with G, then the above solution is optimal for the problem defined by (1)-(3).

Proof. Let $\eta = \eta_1, \ldots, \eta_g$ be the noise budgets corresponding to the g groups of S; i.e., all $\varepsilon$ values for
the rows in group 1 are equal to $\eta_1$, etc. Because of the grouping property, each condition (2) becomes
$\sum_{i=1}^{g} C_i \eta_i \le \varepsilon$, where $C_i$ is the value defined by the bounded column norm for the group i (recall Definition 3.1).
Since the objective function is a minimization, we can make this inequality an equality. Clearly,
$\{\eta_i\}_i$ are a feasible solution for (1)-(3).

If R is consistent with G, we can change any optimal solution of (1)-(3) into a solution in which all $\varepsilon$
values in a group are equal, without increasing the objective function. We omit a formal proof here. □

Thus, when S has grouping function G, we can write a simpler optimization problem for noise budgeting:

Minimize: $\sum_{i=1}^{g} \frac{1}{\eta_i^2} \sum_{r:G(r)=i} b_r$ (4)
Subject to: $\sum_{i=1}^{g} C_i \eta_i = \varepsilon$ (5)
$\eta_i \ge 0$, $1 \le i \le g$ (6)

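Since there is now just a single constraint on the $\eta_i$'s, we can solve this via a simple Lagrange multiplier
method. The corresponding Lagrange function is:

$$\Lambda(\lambda, \eta) = \sum_{i=1}^{g} \frac{1}{\eta_i^2} \sum_{r:G(r)=i} b_r + \lambda \Big( \sum_{i=1}^{g} C_i \eta_i - \varepsilon \Big).$$

Setting the partial derivatives $\partial\Lambda/\partial\eta_i$ to zero, we obtain $\eta_i = \big( \frac{2}{\lambda C_i} \sum_{r:G(r)=i} b_r \big)^{1/3}$. By the privacy constraint
(5), $\sum_{i=1}^{g} C_i \big( \frac{2}{\lambda C_i} \sum_{r:G(r)=i} b_r \big)^{1/3} = \varepsilon$ and thus:

$$\lambda = \frac{1}{\varepsilon^3} \Big( \sum_{i=1}^{g} \Big( 2 C_i^2 \sum_{r:G(r)=i} b_r \Big)^{1/3} \Big)^3.$$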

Corollary 3.3. In the case when all values $C_i$ are equal to the same value C, the optimum value of the
objective function is equal to $\frac{C^2}{\varepsilon^2} \big( \sum_{i=1}^{g} s_i^{1/3} \big)^3$, where $s_i = \sum_{r:G(r)=i} b_r$. For $(\varepsilon, \delta)$-differential privacy
the corresponding value of the objective function is equal to $\frac{2C^2 \log(2/\delta)}{\varepsilon^2} \big( \sum_{i=1}^{g} \sqrt{s_i} \big)^2$.
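
The closed form obtained from the Lagrange multiplier can be turned into a short routine. The sketch assumes the Laplace case, the group sums $s_i = \sum_{r:G(r)=i} b_r$, and the single constraint $\sum_i C_i \eta_i = \varepsilon$, which give $\eta_i = \varepsilon \, (2 s_i / C_i)^{1/3} / \sum_k (2 C_k^2 s_k)^{1/3}$; the function names are ours.

```python
def optimal_group_budgets(s, C, eps):
    """Closed-form minimizer of sum_i s_i / eta_i**2 s.t. sum_i C_i * eta_i = eps."""
    normalizer = sum((2.0 * c * c * si) ** (1.0 / 3.0) for si, c in zip(s, C))
    return [eps * (2.0 * si / c) ** (1.0 / 3.0) / normalizer
            for si, c in zip(s, C)]

def grouped_objective(s, eta):
    """Value of sum_i s_i / eta_i**2 for the given group budgets."""
    return sum(si / e ** 2 for si, e in zip(s, eta))

# Running example with S = Q, R = I and a = 1: each row has b_i = 2, so the
# group of 2 rows has s_1 = 4 and the group of 4 rows has s_2 = 8; C_1 = C_2 = 1.
s, C, eps = [4.0, 8.0], [1.0, 1.0], 1.0
eta = optimal_group_budgets(s, C, eps)
best = grouped_objective(s, eta)
```

For these inputs the budgets come out near 0.4425 and 0.5575, and the objective agrees with $(C^2/\varepsilon^2)(\sum_i s_i^{1/3})^3$, slightly below the 46.17 obtained with the rounded budgets of the introductory example.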
Lemma 3.2 implies the following.

Theorem 3.4. Let S be a strategy matrix with grouping function G, and R be a corresponding recovery
matrix consistent with G. Then the solution to the optimization problem (4)-(6) is the optimal noise
budgeting for S and R.

As discussed above, when $Q$ is a set of marginals, all the strategy/recovery matrices proposed in prior
work fit the conditions of Theorem 3.4, and thus their accuracy can be improved via optimal noise budgeting.
Observe that the optimization problem (4)-(6) can be solved reasonably fast: given $R$, $S$ and a grouping
of $S$, we can derive the vector of $b_i$ values in time linear in the size of $R$, i.e., in $O(qm)$. For particular $S$
(e.g. the Fourier matrix), the cost can be even lower, due to the symmetric structure of $S$ and $R$. Finally, all
$\epsilon_i$ values, as well as $\mathrm{Var}(y)$, can be computed in $O(m)$ time.
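As an illustration only, the closed-form solution of (4)-(6) for the Laplace case can be sketched in a few lines of Python. The function names and the toy inputs (`s`, `c`, `eps`) are assumptions made for this sketch; `s[i]` plays the role of $s_i = \sum_{r:G(r)=i} b_r$ and `c[i]` the role of $C_i$. The sketch allocates $\eta_i \propto (s_i / C_i)^{1/3}$, rescaled so that $\sum_i C_i \eta_i = \epsilon$, which is what the Lagrange conditions give.

```python
def optimal_budget(s, c, eps):
    """Per-group budgets eta_i proportional to (s_i / c_i)**(1/3),
    rescaled so that sum_i c_i * eta_i == eps (constraint (5))."""
    weights = [(si / ci) ** (1.0 / 3.0) for si, ci in zip(s, c)]
    norm = sum(ci * wi for ci, wi in zip(c, weights))
    return [eps * wi / norm for wi in weights]


def objective(s, eta):
    """Value of objective (4): sum_i s_i / eta_i**2."""
    return sum(si / ei ** 2 for si, ei in zip(s, eta))


def closed_form_equal_c(s, c_value, eps):
    """Optimum for equal C_i = C: (C^2 / eps^2) * (sum_i s_i^(1/3))^3."""
    return (c_value ** 2 / eps ** 2) * sum(si ** (1.0 / 3.0) for si in s) ** 3


# Toy instance (hypothetical numbers, three groups with equal C_i = 1).
s = [8.0, 1.0, 1.0]
c = [1.0, 1.0, 1.0]
eps = 1.0

eta = optimal_budget(s, c, eps)
uniform = [eps / sum(c)] * len(s)  # uniform split that also meets (5)

print("non-uniform objective:", objective(s, eta))
print("uniform objective:    ", objective(s, uniform))
print("closed form:          ", closed_form_equal_c(s, 1.0, eps))
```

On this toy instance the group with the largest $s_i$ receives the largest share of the budget, and the objective of the non-uniform allocation agrees with the closed form of Corollary 3.3 while being smaller than that of the uniform split.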



3.2 Optimal Recovery Matrix (Step 3) 

Given $S \in \mathbb{R}^{m \times N}$ and the noise parameters $\epsilon_i$, we wish to compute a matrix
$R \in \mathbb{R}^{q \times m}$ such that $Q = RS$ and $\mathrm{Var}(y)$ is minimized. Recall that
$y = Rz = R(Sx + \nu)$, where $\nu_i$ is drawn from the Laplace distribution of variance $2/\epsilon_i^2$,
as in Section 3.1 (the case of the Gaussian distribution is similar). As we show below, the
resulting $y$ will also be consistent.

We derive $R$ via least squares statistical estimators. More precisely, given $z = Sx + \nu$, we first compute
an estimate $\hat{x}$ of $x$ which is linear in $z$ and has minimum variance. The vector $\hat{x}$ is called the optimal
(generalized) least squares solution. As we show below, $\hat{x} = Gz$ for some matrix $G$. We then define
$R = QG$ and $y = Rz = Q\hat{x}$. A similar approach was used in [16] for the case of uniform noise. We extend
the computation to the case of non-uniform noise budgets $\epsilon_i$.

Let $\Sigma$ be the covariance matrix of $z$: $\Sigma = \mathrm{Cov}(z) = \mathrm{diag}(2/\epsilon_i^2)$. Define
$U = \Sigma^{-1/2} S$; hence, $\mathrm{rank}(U) = \mathrm{rank}(S)$. For simplicity, we assume that
$\mathrm{rank}(S) = N$. The same ideas as in [16, Section 3.3] can be used
to handle the case $\mathrm{rank}(S) < N$. Then the LS solution is computed as

$\hat{x} = (U^T U)^{-1} U^T \Sigma^{-1/2} z$.

Since $\Sigma$ is diagonal, $\Sigma = \Sigma^T$. We obtain

$(U^T U)^{-1} U^T = (S^T \Sigma^{-1/2} \Sigma^{-1/2} S)^{-1} S^T \Sigma^{-1/2} = (S^T \Sigma^{-1} S)^{-1} S^T \Sigma^{-1/2}$.

Thus, $\hat{x} = (S^T \Sigma^{-1} S)^{-1} S^T \Sigma^{-1} z$.

Let $G = (S^T \Sigma^{-1} S)^{-1} S^T \Sigma^{-1}$. We define $R = QG$, i.e.

$R = Q (S^T \Sigma^{-1} S)^{-1} S^T \Sigma^{-1}$. (7)



Note that $y = Rz = Q\hat{x}$ is consistent, as per Definition 2.3 (with $x_c = \hat{x}$). By a well-known result from
linear statistical estimation, the following holds:

Lemma 3.5. Matrix $R$ computed as in (7) minimizes $a^T \mathrm{Var}(y)$ (where $y = Rz$). Moreover, $y$ is consistent
and unbiased, i.e., $E[y] = Qx$.

Observation 1. If $S$ is an orthonormal basis (as with wavelets, Fourier and identity strategies), we have
$S^T = S^{-1}$. This implies $G = S^{-1} = S^T$, so $R = QS^T$.

The cost of finding $R$ as above is relatively high, due to the need to perform matrix inversion. While the
diagonal matrix $\Sigma$ is trivial to invert, since $(\Sigma^{-1})_{ii} = (\Sigma_{ii})^{-1}$, the matrix
$S^T \Sigma^{-1} S$ is generally dense, so is more costly to invert.
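The recovery matrix of equation (7) can be illustrated with a small, self-contained Python sketch. It is not an optimized implementation: it uses exact rational arithmetic and a textbook Gauss-Jordan inverse, and the helper names (`matmul`, `inverse`, `recovery_matrix`) as well as the toy strategy `S` and workload `Q` are assumptions made for this example. It assumes Laplace noise, so that $\Sigma^{-1} = \mathrm{diag}(\epsilon_i^2/2)$, and $\mathrm{rank}(S) = N$.

```python
from fractions import Fraction


def matmul(a, b):
    """Plain matrix product of two lists-of-rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def inverse(a):
    """Gauss-Jordan inverse of a small nonsingular square matrix (exact)."""
    n = len(a)
    aug = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def recovery_matrix(q, s, eps):
    """R = Q (S^T Sigma^-1 S)^-1 S^T Sigma^-1 with Sigma^-1 = diag(eps_i^2 / 2)."""
    m, n = len(s), len(s[0])
    sigma_inv = [Fraction(e) ** 2 / 2 for e in eps]
    # S^T Sigma^-1 is an N x m matrix: column i of S^T scaled by sigma_inv[i].
    st_w = [[s[i][j] * sigma_inv[i] for i in range(m)] for j in range(n)]
    g = matmul(inverse(matmul(st_w, s)), st_w)
    return matmul(q, g)


# Toy strategy: two base counts plus their sum; workload: the total count.
S = [[1, 0], [0, 1], [1, 1]]
Q = [[1, 1]]

R_uniform = recovery_matrix(Q, S, [1, 1, 1])
R_skewed = recovery_matrix(Q, S, [1, 1, 2])  # third strategy row answered more accurately
print(R_uniform, R_skewed)
```

In this toy instance both recovery matrices satisfy $Q = RS$, and giving the third strategy query a larger budget shifts weight in $R$ toward that row, as expected from a generalized least squares estimator.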
3.3 Fast Consistency



The vector $y = Rz = R(Sx + \nu)$ computed via the optimal recovery matrix $R$ in Step 3 (Section 3.2) has two
important properties: (i) $y$ is consistent (Definition 2.3); and (ii) $\mathrm{Var}(y_i)$ is minimized for each
$1 \leq i \leq q$. Since $E[y_i] = Q_i x$, where $Q_i$ is the $i$-th row of $Q$, we have
$\mathrm{Var}(y_i) = E[(y_i - Q_i x)^2]$. Thus, $y$ achieves minimum error for each query
in expectation. As observed in [1], for practical applications it may be necessary to return a vector $y^1$
which is consistent and minimizes a different error measure, e.g., we may wish to minimize $\|y^1 - Qx\|_p$.
For example, $p = 1$ implies that $y^1$ minimizes the average error and $p = \infty$ minimizes the maximum error.

In this section we show how to efficiently compute another recovery matrix $R^1$ such that $y^1 = R^1 z$ is
consistent and $\|y^1 - Qx\|_p$ is small. This approach is particularly useful when the query matrix
$Q \in \mathbb{R}^{q \times N}$ has $q \ll N$. As we show below, we significantly improve the running times of the
approaches used in [1, 6] for this case. The approach in [1, 6], translated into our strategy/recovery framework,
is described below.

Start by defining a recovery matrix $R^0$ such that $Q = R^0 S$ and $y^0 = R^0(Sx + \nu)$ has bounded error
$\|y^0 - Qx\|_p \leq t$. Usually, $R^0$ is the recovery matrix from Step 2 of our framework. For example, the matrix
$R^0$ in [6] is implied by the clustering function over marginals, which heuristically minimizes some $L_p$-error
of the noisy answers. Next, compute a consistent answer $y^1$ that minimizes $\|y^1 - y^0\|_p$. Recall that $y^1$ is
consistent if there exists $x_c$ such that $y^1 = Qx_c$. Hence, for $p = 1$ or $p = \infty$, $y^1$ can be computed via a
linear program (LP) with variables corresponding to the entries of $x_c$, and consistency conditions expressed
as linear constraints. Other requirements can also be imposed on $x_c$, e.g., integrality or non-negativity. For
$p = 2$, $y^1$ is the solution to a least squares problem (LS).

However, such an LP, resp. LS, uses at least $N$ variables corresponding to the entries in $x_c$. When $N$ is
large (as is usually the case), this leads to large linear programs. This issue was reported as a bottleneck in
the experimental evaluation of prior work. We now propose a different LP, resp. LS, formulation for the consistency
problem, which requires at most $q$ variables (recall that $q$ is the number of queries in the workload $Q$). This
leads to large improvements in running time when $q \ll N$.

First, note that $\mathrm{rank}(Q) = q$ implies that any answer $y \in \mathbb{R}^q$ is consistent. This is because the linear
system $Qx_c = y$ admits the solution $x_c = Q^T (QQ^T)^{-1} y$ ($\mathrm{rank}(Q) = q$ implies that $QQ^T$ is invertible). In
particular, $y^1 = y^0$ is consistent and minimizes $\|y^1 - y^0\|_p$ for any $p$.

Assume that $\mathrm{rank}(Q) = q' < q$. We pick $q'$ linearly independent rows of $Q$, denoted as
$Q' \in \mathbb{R}^{q' \times N}$, and use them to decompose $Q$ as $Q = CQ'$ for some matrix
$C \in \mathbb{R}^{q \times q'}$. Because $Q'$ has linearly independent
rows, the above argument implies that, for any $y \in \mathbb{R}^{q'}$, the linear system $Q'x_c = y$ has a solution. Hence,
any answer $y$ is consistent for the queries $Q'$. Then $y^1 = Cy$ is consistent for all queries $Q$:
$y^1 = Cy = CQ'x_c = Qx_c$. We find $y$ that minimizes $\|Cy - y^0\|_p$ and return $Cy$: for $p = 1$ and $p = \infty$,
$y$ is the solution to an LP; for $p = 2$, $y$ is the solution to a least squares problem. In all cases, the number of
variables is $q' < q \ll N$. As observed in [1], the utility guarantee follows by the triangle inequality. If
$\|Qx - y^0\|_p \leq t$, then

$\|y^1 - y^0\|_p = \min_y \|Cy - y^0\|_p \leq \|CQ'x - y^0\|_p \leq t$.



Thus, the additional $L_p$-error introduced by consistency is at most the $L_p$-error of the original noisy answer,
i.e., the error at most doubles.
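For $p = 2$, the reduced problem over $q'$ variables can be sketched as follows. This is a toy illustration under stated assumptions, not the authors' code: the workload has three queries (a total and its two parts), so $q = 3$ and $q' = 2$, the matrix `C` encodes "total = part 0 + part 1", and the helper names `solve` and `make_consistent_l2` are invented for the example. The least squares solution is obtained from the normal equations $C^T C\, y = C^T y^0$.

```python
from fractions import Fraction


def solve(a, b):
    """Solve a x = b for a small nonsingular square matrix (exact Gauss-Jordan)."""
    n = len(a)
    aug = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n] for row in aug]


def make_consistent_l2(c, y0):
    """Return y1 = C y, where y minimizes ||C y - y0||_2 (normal equations)."""
    cols = [list(col) for col in zip(*c)]
    ctc = [[sum(u * v for u, v in zip(ci, cj)) for cj in cols] for ci in cols]
    cty = [sum(u * v for u, v in zip(ci, y0)) for ci in cols]
    y = solve(ctc, cty)
    return [sum(cv * yv for cv, yv in zip(row, y)) for row in c]


# Q has rows (total, part 0, part 1); Q' keeps the two parts, so Q = C Q'.
C = [[1, 1],
     [1, 0],
     [0, 1]]
y0 = [10, 4, 7]  # noisy answers: 4 + 7 != 10, hence inconsistent

y1 = make_consistent_l2(C, y0)
print(y1)
```

The returned vector satisfies the linear dependency among the queries exactly (its first entry equals the sum of the other two), and only $q' = 2$ unknowns were needed regardless of the domain size $N$.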

When $Q$ is a set of marginals, we can formulate the LP, resp. LS, without explicitly computing $\mathrm{rank}(Q)$
or finding a collection of linearly independent rows $Q'$. Rather, we use the Fourier coefficients of the
marginals. The discussion is deferred to Section 4.3.



4 Consistent Marginals via Fourier Strategies 

In this section, we focus on the case when all queries $Q$ correspond to marginals. Here, we show that the
choice of $S$ as an appropriate Fourier matrix gives strong guarantees on the variance, as well as providing
consistent query answers.



4.1 Marginals and Fourier analysis 

In this section we assume that all $d$ attributes in the database table are binary; for simplicity, let the domain of
each attribute be $\{0, 1\}$. We emphasize that this assumption is without loss of generality: an attribute which
has $|A|$ distinct values can be mapped to $\lceil \log |A| \rceil$ binary attributes (and we do so in our experimental study).
However, we present our results with binary attributes to avoid overcomplicating the notation. Consequently,
there are $N = 2^d$ entries in the database vector $x$, where each entry is indexed by some $\alpha \in \{0, 1\}^d$, and $x_\alpha$
is the number of entries in the database with attributes $\alpha$; recall the example in Figure 1(a).

There are also $2^d$ possible marginals (a.k.a. subcubes of the data cube) of interest, corresponding to
aggregations along a subset of dimensions. For any $\alpha \in \{0, 1\}^d$, let $C^\alpha$ denote the marginal over
non-zero attributes in $\alpha$, and let $\|\alpha\|$ denote the number of non-zero entries in $\alpha$, i.e., the dimensionality of the
marginal. Note that here $\alpha$ is the bit-vector indicator for the attributes in the marginal. We will consistently
use it as a superscript in such cases, and as a subscript when it indexes an entry in a vector.

We use the following notations, as in [1]: For any pair of $\alpha, \beta \in \{0, 1\}^d$ we write $\alpha \wedge \beta$ for the
bit-wise intersection of the pair, i.e. $(\alpha \wedge \beta)_i = \alpha_i \wedge \beta_i$. The inner-product in this space,
$\langle \alpha, \beta \rangle$, can also be expressed via the intersection operator:
$\langle \alpha, \beta \rangle = \|\alpha \wedge \beta\|$. We say that $\alpha$ is dominated by $\beta$, denoted $\alpha \preceq \beta$,
if $\alpha \wedge \beta = \alpha$.

The computation of a marginal $C^\alpha$ over the input can be thought of as a linear operator
$C^\alpha : \mathbb{R}^{2^d} \to \mathbb{R}^{2^{\|\alpha\|}}$ mapping the full-dimensional contingency table to the marginal over
non-zero attributes in $\alpha$, by adding relevant entries over the attributes not in $\alpha$. More precisely, for each
$\beta \preceq \alpha$, the cell $\beta$ in the marginal $C^\alpha$, denoted $(C^\alpha x)_\beta$, sums the entries in the contingency table $x$
whose attributes in $\alpha$ are set to values specified by $\beta$:
$(C^\alpha x)_\beta = \sum_{\eta : \eta \wedge \alpha = \beta} x_\eta$.

Example. Let $x$ be the vector in Figure 1(a). Assume we want to compute the marginal $C^\alpha = C^{110}$, i.e.,
the marginal over attributes $A$ and $B$. Then the value in the cell $(A = 0, B = 0)$ is denoted by $(C^{110} x)_{000}$
(i.e., $\beta = 000$). The value in the cell $(A = 0, B = 1)$ is denoted by $(C^{110} x)_{010}$. Note that $000 \preceq 110$ and
$010 \preceq 110$. On the other hand, $001 \not\preceq 110$, so there is no cell $(C^{110} x)_{001}$ in the marginal over $A, B$. So,
while the cell index $\beta$ is $d$-dimensional, only the $\|\alpha\|$ bits corresponding to non-zeros in $\alpha$ vary; the rest
are held at 0. Hence, there are only $2^{\|\alpha\|}$ cell indexes in the marginal $C^\alpha x$. In this example, there are only
4 cells in $C^{110} x$. By the above formula, $(C^{110} x)_{000} = x_{000} + x_{001} = 3$ and
$(C^{110} x)_{010} = x_{010} + x_{011} = 1$. □
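The marginal operator of this example can be sketched in Python by encoding each index in $\{0,1\}^d$ as an integer bitmask (attribute $A$ as the most significant bit). The data vector `x` below is hypothetical: its first four entries were chosen only so that the two sums quoted in the example (3 and 1) are reproduced, and the remaining entries are arbitrary, since the figure itself is not reproduced here.

```python
def marginal(x, alpha, d):
    """Compute C^alpha x as a dict {beta: value}, with beta dominated by alpha.

    x is a list of length 2**d indexed by integer bitmasks eta; each entry
    x[eta] is added to the cell beta = eta AND alpha.
    """
    cells = {}
    for eta in range(1 << d):
        beta = eta & alpha
        cells[beta] = cells.get(beta, 0) + x[eta]
    return cells


# Hypothetical contingency table over attributes (A, B, C), d = 3.
# Index order: 000, 001, 010, 011, 100, 101, 110, 111.
x = [1, 2, 0, 1, 2, 0, 1, 1]

c_110 = marginal(x, 0b110, 3)
for beta in sorted(c_110):
    print(format(beta, "03b"), c_110[beta])
```

The result has exactly $2^{\|\alpha\|} = 4$ cells, indexed by 000, 010, 100 and 110; indexes such as 001 never appear because they are not dominated by 110.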
The set of all marginals $C^\alpha$ with $\|\alpha\| = k$ is referred to as the set of all $k$-way marginals. They are
commonly used to visualize the low-rank dependencies between attributes, to build efficient classifiers from
the data, and so on.

We use the Hadamard transform, which is the $2^d$-dimensional discrete Fourier transform over the Boolean
hypercube $\{0, 1\}^d$. This allows us to represent any marginal as a summation over relevant Fourier coefficients.
The advantage is that the number of coefficients needed for each marginal is just the number of entries
in the marginal. The Fourier basis vectors $f^\alpha$ for $\alpha \in \{0, 1\}^d$ have components
$f^\alpha_\beta = 2^{-d/2} (-1)^{\langle \alpha, \beta \rangle}$. The
vectors $f^\alpha$ form an orthonormal basis in $\mathbb{R}^{2^d}$. We will use the following properties of Fourier basis vectors
and marginal operators in the Fourier basis (proofs can be found in [1]):

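Theorem 4.1. For all $\alpha, \beta \in \{0, 1\}^d$ we have:

1. $(C^\alpha f^\beta)_\gamma = \sum_{\eta : \eta \wedge \alpha = \gamma} f^\beta_\eta = \sum_{\eta : \eta \wedge \alpha = \gamma} (-1)^{\langle \beta, \eta \rangle} / 2^{d/2}$.

2. $C^\alpha x = \sum_{\beta \preceq \alpha} \langle f^\beta, x \rangle \, C^\alpha f^\beta$.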
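The second property of Theorem 4.1 can be checked numerically on a small instance. The sketch below is only an illustration: indexes are encoded as integer bitmasks, the data vector `x` is hypothetical, and the function names are invented for this example. It reconstructs a marginal from the $2^{\|\alpha\|}$ Fourier coefficients $\langle f^\beta, x \rangle$ with $\beta \preceq \alpha$ and compares the result with the directly computed marginal.

```python
def popcount(v):
    """Number of set bits, i.e. ||v|| for a bitmask v."""
    return bin(v).count("1")


def fourier_vector(beta, d):
    """Basis vector f^beta with components 2^(-d/2) * (-1)^<beta, eta>."""
    scale = 2.0 ** (-d / 2.0)
    return [scale * (-1) ** popcount(beta & eta) for eta in range(1 << d)]


def marginal(x, alpha, d):
    """Direct computation of C^alpha x as a dict {cell index: value}."""
    cells = {}
    for eta in range(1 << d):
        cells[eta & alpha] = cells.get(eta & alpha, 0.0) + x[eta]
    return cells


def marginal_via_fourier(x, alpha, d):
    """C^alpha x = sum over beta dominated by alpha of <f^beta, x> * C^alpha f^beta."""
    dominated = [b for b in range(1 << d) if (b & alpha) == b]
    cells = {gamma: 0.0 for gamma in dominated}
    for beta in dominated:
        f_beta = fourier_vector(beta, d)
        coeff = sum(fb * xv for fb, xv in zip(f_beta, x))
        for gamma, value in marginal(f_beta, alpha, d).items():
            cells[gamma] += coeff * value
    return cells


x = [1, 2, 0, 1, 2, 0, 1, 1]  # hypothetical data, d = 3
direct = marginal(x, 0b110, 3)
via_fourier = marginal_via_fourier(x, 0b110, 3)
print(direct)
print(via_fourier)
```

Only four coefficients are used for the 2-way marginal $C^{110}$, matching the remark that the number of coefficients needed equals the number of entries in the marginal.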
Table 1: Releasing all $k$-way marginals for $k \leq d/2$. Expected noise per marginal:
$E \|C^\alpha x - \tilde{C}^\alpha x\|_1$. The total number of released marginals is $\binom{d}{k}$ and the total number of
Fourier coefficients required to compute these marginals is $\sum_{i=0}^{k} \binom{d}{i}$.



4.2 Bounds for marginals 

The use of a Fourier strategy matrix was studied in [1], under a uniform error budget. Here, we show that
using a non-uniform budgeting can provide asymptotically improved results. We study the case when the
query set $Q$ corresponds to a collection of $\ell$ marginals $C^{\alpha_1}, \ldots, C^{\alpha_\ell}$. For a given marginal
$C^{\alpha_i}$ the accuracy bounds will be parametrized by its dimensionality $\|\alpha_i\|$, the total number of marginals
$\ell$ and the total number of Fourier coefficients corresponding to the collection of marginals, denoted as $|F|$.
Theorem 4.1(2) implies that $|F| = |\bigcup_i \{\beta : \beta \preceq \alpha_i\}|$. If the random variable corresponding to
the differentially private value of a marginal $C^\alpha x$ is denoted as $\tilde{C}^\alpha x$, then we state a bound on the
expected absolute error, $E \|C^\alpha x - \tilde{C}^\alpha x\|_1$,
to simplify presentation and comparison with prior work. All our bounds can also be stated in terms of the
variance $\mathrm{Var}(\tilde{C}^\alpha x)$, or as high-probability bounds.

The asymptotic bounds on error are easier to interpret in the important special case of the set of all $k$-way
marginals. In this case, because of the symmetry of the query workload, the expected error in all marginals
is the same. Table 1 summarizes bounds on error in this case together with the unconditional lower bounds
for all differentially private algorithms from [15]. While in the case of $(\epsilon, \delta)$-differential privacy our upper
bounds are almost tight with the lower bounds from [15], for $\epsilon$-differential privacy the gap is still quite
significant and remains a challenging open problem.

Our next lemma (proof in Appendix C) gives bounds on the expected error of the Fourier strategy with
non-uniform noise.

Lemma 4.2. For a query workload consisting of all $k$-way marginals over data $x \in \mathbb{R}^{2^d}$ the bounds on the
expected error of the Fourier strategy mechanism with non-uniform noise are given as follows:

1. For $\epsilon$-differential privacy the expected noise per marginal is
$O\left( \frac{1}{\epsilon} \cdot k \sqrt{\binom{d}{k} \binom{d+k}{k}} \right)$.

2. For $(\epsilon, \delta)$-differential privacy the expected noise per marginal is
$O\left( \frac{1}{\epsilon} \cdot \sqrt{\binom{d+k}{k}} \cdot \sqrt{k \log(1/\delta)} \right)$.

These bounds are summarized in Table 1 along with those that follow from other approaches. We note
in passing that we can provide a tighter analysis of the noise for the Fourier strategy under uniform noise
than in [1], by a factor of $O(\sqrt{2^k})$; details are in Appendix B.



Time cost comparison. To directly compute a single $k$-way marginal over $d$-dimensional data takes time
$O(2^d)$, and so computing all $k$-way marginals takes $O(d^k 2^d)$ naively. Computing the Fourier transform of
the data takes time $O(d 2^d)$, and deriving the $k$-way marginals from this takes time $O(4^k)$ per marginal, i.e.,
$O(d 2^d + 4^k d^k)$ for all marginals [1]. We compare the cost of different strategies to these costs. Materializing
noisy counts ($S = I$) and aggregating them to obtain the $k$-way marginals also takes time $O(d^k 2^d)$, as does
materializing the marginals and then adding noise ($S = Q$).
The clustering method proposed in [6] is more expensive, due to a search over the space of possible
marginals to output. The cost is $O(d^k k \min(2^d d^k, 3^d))$: clearly as the dimensionality grows, this rapidly
becomes infeasible. However, across all strategies, the step of choosing the non-uniform error budget is
dominated by the other costs, and so does not alter that asymptotic cost.
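The $O(d 2^d)$ cost quoted for the Fourier transform of the data corresponds to the standard fast Walsh-Hadamard transform. A minimal sketch is given below for illustration; it is a textbook in-place butterfly, not code from the paper, and the function names and the data vector are assumptions. A naive $O(4^d)$ version is included only to cross-check the fast one.

```python
def fwht(x):
    """Normalized fast Walsh-Hadamard transform of a vector of length 2**d.

    Output entry beta equals 2^(-d/2) * sum_eta (-1)^<beta, eta> x[eta],
    computed with d passes of 2**(d-1) butterflies each.
    """
    a = list(x)
    n = len(a)
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                u, v = a[i], a[i + h]
                a[i], a[i + h] = u + v, u - v
        h *= 2
    scale = n ** -0.5
    return [v * scale for v in a]


def naive_transform(x):
    """Direct O(N^2) evaluation of the same coefficients, for checking."""
    n = len(x)
    scale = n ** -0.5
    return [scale * sum((-1) ** bin(beta & eta).count("1") * x[eta] for eta in range(n))
            for beta in range(n)]


x = [1, 2, 0, 1, 2, 0, 1, 1]  # hypothetical data, d = 3
coeffs = fwht(x)
print(coeffs)
print(fwht(coeffs))  # the normalized transform is its own inverse
```

Because the normalized Hadamard matrix is symmetric and orthonormal, applying `fwht` twice returns the original vector up to floating-point error, which also illustrates Observation 1 for the Fourier strategy.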

4.3 Consistency via Fourier coefficients 

In Section 3.3 we discussed a general approach for computing consistent answers for a query workload
$Q \in \mathbb{R}^{q \times N}$ with $\mathrm{rank}(Q) < q$. The approach required explicitly finding a set of $\mathrm{rank}(Q)$ linearly independent
rows in $Q$ and decomposing $Q$. We now show that when $Q$ is a set of marginals we can compute consistent
answers without such expensive steps. Instead, we ensure consistency by writing an LP that uses the Fourier
coefficients corresponding to the marginals in $Q$. Let $Q$ consist of $\ell$ marginals $C^{\alpha_1}, \ldots, C^{\alpha_\ell}$. We introduce
variables for the Fourier coefficients corresponding to these marginals, denoted as
$F = \{f^\beta \mid \exists\, i : \beta \preceq \alpha_i\}$.
To simplify notation, we rename them as $F = \{f_1, \ldots, f_m\}$, where $|F| = m$. Marginals
$C^{\alpha_1}, \ldots, C^{\alpha_\ell}$ can be computed from $F$, using formulas from Theorem 4.1:

$(C^{\alpha_i})_\gamma = \sum_{\beta \preceq \alpha_i} f^\beta \, (C^{\alpha_i} f^\beta)_\gamma$

for all $i \leq \ell$ and $\gamma \preceq \alpha_i$, where the $f^\beta$ multiplying each term is the coefficient variable and
$C^{\alpha_i} f^\beta$ is the marginal of the corresponding basis vector. We will index entries in the marginals by pairs
$(i, \gamma)$, where $\gamma \preceq \alpha_i$. Let the total number of entries in marginals $C^{\alpha_1}, \ldots, C^{\alpha_\ell}$ be equal to
$\sum_{i=1}^{\ell} 2^{\|\alpha_i\|} = K$. Let $R$ be the recovery matrix for the Fourier strategy: $R \in \mathbb{R}^{K \times m}$ with entries
$R_{(i,\gamma),\beta} = (C^{\alpha_i} f^\beta)_\gamma$. Then $(C^{\alpha_1}, \ldots, C^{\alpha_\ell}) = R \cdot (f_1, \ldots, f_m)$.
Suppose that we are given a set of inconsistent noisy values of these marginals
$(\tilde{C}^{\alpha_1}, \ldots, \tilde{C}^{\alpha_\ell})$.
We formulate the following optimization problem to find the consistent set of marginals
$(C^{\alpha_1}, \ldots, C^{\alpha_\ell})$ that is closest to the noisy values in $L_p$-norm:

Minimize $\|(C^{\alpha_1}, \ldots, C^{\alpha_\ell}) - (\tilde{C}^{\alpha_1}, \ldots, \tilde{C}^{\alpha_\ell})\|_p$
Subject to $(C^{\alpha_1}, \ldots, C^{\alpha_\ell}) = R \cdot (f_1, \ldots, f_m)$

For $p = 2$ this gives a least squares problem, with $m$ variables and $K$ constraints, which is expressed as:

Minimize $\|R \cdot (f_1, \ldots, f_m) - (\tilde{C}^{\alpha_1}, \ldots, \tilde{C}^{\alpha_\ell})\|_2$.

For $p = 1$ and $p = \infty$, this gives an LP similar to [1].

The running time of this consistency step via least squares only depends on the number of queries. For
example, for the case of all $k$-way marginals, we need to work with matrices of size $O(d^k)$, and perform
a constant number of multiplications and inversions. In contrast, prior work required solving LPs of size
proportional to the size of the data, $N = 2^d$, which takes time polynomial in $N$.
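A toy instance of the $p = 2$ problem is sketched below, for illustration only. It takes $d = 2$ and the two 1-way marginals $C^{10}$ and $C^{01}$, builds $R$ from the entries $(C^{\alpha_i} f^\beta)_\gamma = (-1)^{\langle \beta, \gamma \rangle} 2^{d/2 - \|\alpha_i\|}$ for $\beta \preceq \alpha_i$ (zero otherwise), and projects hypothetical noisy values onto the column space of $R$. The function names are invented for the example. The code checks that the columns of $R$ are mutually orthogonal in this instance before using the resulting column-by-column formula; a general least squares solver could be substituted.

```python
def popcount(v):
    return bin(v).count("1")


def build_recovery(alphas, d):
    """Rows are indexed by (marginal, cell gamma), columns by coefficients beta."""
    betas = sorted({b for a in alphas for b in range(1 << d) if (b & a) == b})
    rows = []
    for a in alphas:
        k = popcount(a)
        for g in range(1 << d):
            if (g & a) != g:
                continue  # gamma must be dominated by alpha
            rows.append([(-1) ** popcount(b & g) * 2.0 ** (d / 2.0 - k)
                         if (b & a) == b else 0.0 for b in betas])
    return betas, rows


def closest_consistent_l2(alphas, noisy, d):
    """Least squares fit of Fourier coefficients, then consistent marginals R f."""
    _, r = build_recovery(alphas, d)
    cols = list(zip(*r))
    # Orthogonality check: with orthogonal columns the normal equations are diagonal.
    for i in range(len(cols)):
        for j in range(i + 1, len(cols)):
            assert abs(sum(u * v for u, v in zip(cols[i], cols[j]))) < 1e-12
    coeffs = [sum(u * v for u, v in zip(col, noisy)) / sum(u * u for u in col)
              for col in cols]
    return [sum(rv * cv for rv, cv in zip(row, coeffs)) for row in r]


# Hypothetical noisy marginals: A = (6, 5) sums to 11, B = (4, 5) sums to 9.
noisy = [6.0, 5.0, 4.0, 5.0]
consistent = closest_consistent_l2([0b10, 0b01], noisy, 2)
print(consistent)
```

After the projection both marginals sum to the same total, which is the consistency property being enforced; only $m = 3$ coefficient variables were needed, independent of $N$.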

5 Experimental study 

Datasets. We studied performance on two real datasets: 

Adult: The Adult dataset from the UCI Machine Learning repository (http://archive.ics.uci.edu/ml/)
has census information on 32561 individuals. As in [6], we extract a subset of sensitive categorical
attributes, for workclass (cardinality 9), education (16), marital-status (7), occupation (15), relationship
(6), race (5), sex (2) and salary (2).

NLTCS: The National Long-Term Case Study from StatLib (http://lib.stat.cmu.edu/) contains
information about 21576 individuals. Each record consists of 16 binary attributes, which correspond to
functional disability measures: 6 activities of daily living and 10 instrumental activities of daily living.
Figure 4: Accuracy of marginal release on ADULT data

Query workloads. The choice of query workloads in our experimental study is motivated by an application 
of low-order marginals to statistical model fitting. In this setting, the typical set of queries consists of all 
k-way marginals (for some small value of k) together with some subset of (k + l)-way marginals, chosen 
depending on the application. We consider three different approaches: 

1. $Q_k$: all the $k$-way marginal tables.

2. $Q_k^*$: all the $k$-way marginal tables, plus half of all $(k+1)$-way marginals.

3. $Q_k^a$: all the $k$-way marginal tables, plus all $(k+1)$-way marginals that include a fixed attribute.

Evaluation metrics. We measure the average absolute error per entry in the set of marginal queries. To 
show the utility of these results, we scale each error by the mean true answer of its respective marginal 
query, i.e., we plot it as a relative error. Thus, a relative error of less than 1 is desirable, as otherwise the 
true answers are dwarfed by the noise (on average). Note that while the number of tuples in each dataset
is relatively small, our approaches do not depend on the tuple count, but rather the dimensionality of the
domain $N = \prod_{i=1}^{d} |A_i|$. Larger datasets would only improve the quality metrics, while keeping the running
time essentially unchanged.
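One possible reading of this metric is sketched below; the exact aggregation used for the plots is not spelled out in the text, so the averaging order here is an assumption. Each entry's absolute error is divided by the mean true answer of the marginal it belongs to, and the scaled errors are then averaged over all entries of all marginals. The function name and the numbers are hypothetical.

```python
def relative_error(true_marginals, noisy_marginals):
    """Average absolute error per entry, each scaled by the mean true
    answer of its own marginal (assumed aggregation, see lead-in)."""
    total_scaled = 0.0
    total_entries = 0
    for true_cells, noisy_cells in zip(true_marginals, noisy_marginals):
        mean_true = sum(true_cells) / len(true_cells)
        for t, n in zip(true_cells, noisy_cells):
            total_scaled += abs(t - n) / mean_true
            total_entries += 1
    return total_scaled / total_entries


# One hypothetical 2-way marginal with four cells.
true_marginals = [[3, 1, 2, 2]]
noisy_marginals = [[4, 1, 1, 2]]
print(relative_error(true_marginals, noisy_marginals))
```

Under this reading a value below 1 means that, on average, the noise is smaller than a typical cell of the marginal being answered.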

Algorithms Used. We present results for $\epsilon$-differential privacy. Results for $(\epsilon, \delta)$-differential privacy are
similar, and are omitted. We include seven approaches within the strategy/recovery framework, based on
choice of the strategy matrix, $S$. Here, the notation $S^+$ indicates that we use the non-uniform noise allocation
for strategy $S$ as described in Section 3.1, while the corresponding $S$ is with uniform noise.



• $S = I$: Add noise via the Laplace mechanism directly to base cells and aggregate up to compute the
marginals. Here, the optimal noise allocation is always uniform.

• $S = Q$: Add uniform ($Q$) or non-uniform ($Q^+$) noise to each marginal independently.

• $S = F$: Add uniform ($F$) or non-uniform ($F^+$) noise to each Fourier coefficient, corresponding to
the given query workload.

• $S = C$: Add uniform ($C$) or non-uniform ($C^+$) noise to each marginal returned by the greedy
clustering strategy proposed in [6].

Figure 5: Accuracy of marginal release on NLTCS data

Our goal is to study the effects of non-uniform noise budgeting over all strategies. The decision on 
which strategy to use rests with the data owner. However, we show clear tradeoffs between running time 
and accuracy for all strategies, which can provide helpful hints. 

To ensure consistency of the released marginals, we use the Fourier analytic approach, described in
Section 4.3.



5.1 Adult Dataset 

Figure 4 shows the results on the Adult data set, for query workloads $Q_1$, $Q_1^*$, $Q_1^a$, $Q_2$, $Q_2^*$ and $Q_2^a$. The
attributes in this data set have varying cardinalities, but are encoded as binary attributes as described in
Section 4.1. We plot the results on a logarithmic scale as we vary the privacy parameter $\epsilon$, to more clearly
show the relative performance of the different measures. Immediately, we can make several observations
about the relative performance of the different methods. On this data, the naive method of materializing
counts ($I$) is never effective: the noise added is comparable to the magnitude of the data in all cases. Across
the different query workloads, choosing the strategy $S = Q$ works generally well. In this case non-uniform
noise allocation can significantly improve the accuracy. For example, over workload $Q_1^*$ (Figure 4(b)), we
see an improvement of 20-25% in accuracy.



For more complex queries which result in more marginals of higher degree ($Q_2^*$ and $Q_2^a$, in Figures 4(e)
and 4(f) respectively), the accuracy is lower overall, and the noise is greater than the magnitude of the data
for more restrictive settings of the privacy parameter $\epsilon$. To some extent this can be mitigated as the number
of individuals represented in the data increases: the noise stays constant as the value of the counts in the
table grows.

Figure 6: Running time over NLTCS data

Across this data, we observe that while the non-uniform approach improves the accuracy of the Fourier 
strategy, it is inferior to other strategies. Although asymptotically this strategy has good properties (as
described in Table 1), in this case $k$ is not very large, so the gap between the $k$ and $2^k$ terms for constant $k$
is absorbed within the big-Oh notation. The running times of our methods were all fast: the Fourier (F) and
Query (Q) methods took at most tens of seconds to complete, while the clustering (C) took longer, due to 
the more expensive clustering step. 



5.2 NLTCS data 

Figure 5 shows the corresponding results on the binary NLTCS data. Over all experiments, there is an
appreciable benefit to applying the optimal non-uniform budgeting. The optimal budgeting case is reliably
better than the uniform version, for the same strategy matrix. There are occasional inversions, due to the
random nature of the mechanisms used, but the trend is clear. The advantage can be notable: for example,
on $Q_1^*$ (Figure 5(b)) and $Q_2^*$ (Figure 5(e)), the error of the Fourier strategy is reduced by 30-35% by using
non-uniform budgeting. For the clustering approach, the improvement is smaller, but still measurable, around
5% on average. However, recall that strategy $C$ becomes infeasible on higher dimensional data, due to its
exponential cost.

Figure 6 shows the end-to-end running time of the different methods. This demonstrates clearly the
dramatically slow running time of the clustering method, which takes several hours to operate on a single,
moderate-sized dataset. As the dimensionality increases, this becomes exponentially worse. By contrast,
the time needed by the other strategies is negligible: always less than a second, and typically less than a
tenth of a second. The optimization and consistency steps take essentially no time at all, compared to the
data handling and processing. On the other hand, all methods have virtually constant time as a function of
the number of tuples in the data.

Returning to the accuracy, over the $k = 1$-way marginals and variations, the approach of materializing
the base counts ($I$) is not competitive, while the clustering strategy ($C$) achieves the least error. The more
lightweight Fourier-based approach achieves slightly more error, but is much more scalable. As the degree
of the marginals increases ($Q_2^*$, Figure 5(e) and $Q_2^a$, Figure 5(f)), the trivial solution of materializing the
base cells becomes more accurate. For workloads that are made up of high-degree marginals, this method
dominates the other approaches, although such workloads are considered less realistic.



6 Concluding Remarks 

We considered the problem of releasing data based on linear queries, which captures the common case 
of data cubes and marginals. We showed how existing matrix-based strategies can be improved by using 
non-uniform noise based on the query workload. Our results show that such non-uniform noise results in 
significantly lower error across all cases considered. Further, the cost of this is low, and the results can be 
made consistent with minimal extra effort. 

Other notions of consistency are possible within this framework. For example, it is sometimes required
that the query answers correspond to a data set in which all counts are integral and non-negative. This can
be achieved when the method actually materializes a noisy set of base counts $x$ (as in the case of strategy $I$)
by adding the constraints that $x_i \geq 0$ and rounding the results to the nearest integer. It remains to show how
to enforce such consistency constraints efficiently when base counts are not explicitly materialized.

On the theoretical side, we have shown bounds on accurate $k$-way release under differential privacy of
$\tilde{O}\left( \sqrt{\binom{d}{k} \binom{d+k}{k}} \right)$. An open problem is to close the gap between this and the lower bound of
$\tilde{\Omega}\left( \sqrt{\binom{d}{k}} \right)$.

Acknowledgments. We thank Adam D. Smith and Moritz Hardt for multiple useful comments and sugges- 
tions. 
A Budgeting for $(\epsilon, \delta)$-differential privacy

Proof of Proposition 3.1, part (ii). Consider a matrix $S' \in \mathbb{R}^{m \times N}$ with entries
$S'_{ij} = \epsilon_i S_{ij} / a$, where $a = \max_j \sqrt{\sum_{i=1}^{m} \epsilon_i^2 S_{ij}^2}$. Because the $L_2$-norm of any column
of $S'$ is at most 1, by Theorem 2.2 adding to the values $S'_i x$ Gaussian noise with variance
$\frac{2 \log(2/\delta)}{\epsilon^2}$ guarantees $(\epsilon, \delta)$-differential privacy. Given these noisy values,
which we denote as $\tilde{S}'_i$, the noisy values $\tilde{S}_i$ can be computed as $\tilde{S}_i = a \tilde{S}'_i / \epsilon_i$, and so
$\tilde{S}_i$ has variance $\frac{2 \log(2/\delta) a^2}{\epsilon^2 \epsilon_i^2}$,
which is at most $\frac{2 \log(2/\delta)}{\epsilon_i^2}$ for $\epsilon \geq a$, as desired. □



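Proof of Corollary 3.3 for $(\epsilon, \delta)$-differential privacy. For $(\epsilon, \delta)$-differential privacy the analog of
optimization problem (4)-(6) is:

Minimize: $2 \log(2/\delta) \sum_{i=1}^{g} \frac{s_i}{\epsilon_i^2}$

Subject to: $\sum_{i=1}^{g} C_i^2 \epsilon_i^2 = \epsilon^2$

We can ignore the multiplicative factor in the objective function, because it doesn't change the optimum
solution. Then the corresponding Lagrange function is:

$\Lambda(\lambda, \epsilon_1, \ldots, \epsilon_g) = \sum_{i=1}^{g} \frac{s_i}{\epsilon_i^2} + \lambda \left( \sum_{i=1}^{g} C_i^2 \epsilon_i^2 - \epsilon^2 \right)$.

Using condition $\frac{\partial \Lambda}{\partial \epsilon_i} = 0$ we have:

$-\frac{2 s_i}{\epsilon_i^3} + 2 \lambda C_i^2 \epsilon_i = 0$,

so $\epsilon_i^2 = \frac{\sqrt{s_i}}{C_i \sqrt{\lambda}}$ and we have

$\sum_{i=1}^{g} C_i^2 \epsilon_i^2 = \frac{1}{\sqrt{\lambda}} \sum_{i=1}^{g} C_i \sqrt{s_i} = \epsilon^2$,

which gives $\sqrt{\lambda} = \frac{1}{\epsilon^2} \sum_{i=1}^{g} C_i \sqrt{s_i}$ and
$\epsilon_i^2 = \frac{\epsilon^2 \sqrt{s_i}}{C_i \sum_{j=1}^{g} C_j \sqrt{s_j}}$. The value of the objective function is now
given as:

$2 \log(2/\delta) \sum_{i=1}^{g} \frac{s_i}{\epsilon_i^2} = \frac{2 \log(2/\delta)}{\epsilon^2} \left( \sum_{i=1}^{g} C_i \sqrt{s_i} \right)^2$.

If all values $C_i$ are equal to $C$ this gives $\frac{2 \log(2/\delta) C^2}{\epsilon^2} \left( \sum_{i=1}^{g} \sqrt{s_i} \right)^2$. □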
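The allocation derived in this proof can be checked numerically with a short sketch. The function names and the toy inputs below are assumptions made for illustration; `s[i]` and `c[i]` stand for $s_i$ and $C_i$, and the constant factor $2\log(2/\delta)$ is left out since it does not affect the optimal allocation.

```python
import math


def gaussian_budget(s, c, eps):
    """eps_i with eps_i^2 = eps^2 * sqrt(s_i) / (c_i * sum_j c_j * sqrt(s_j))."""
    norm = sum(ci * math.sqrt(si) for si, ci in zip(s, c))
    return [eps * math.sqrt(math.sqrt(si) / (ci * norm)) for si, ci in zip(s, c)]


def objective(s, eps_i):
    """sum_i s_i / eps_i^2, i.e. the objective without the 2 log(2/delta) factor."""
    return sum(si / ei ** 2 for si, ei in zip(s, eps_i))


# Toy instance (hypothetical numbers, two groups with C_i = 1).
s = [4.0, 1.0]
c = [1.0, 1.0]
eps = 1.0

budget = gaussian_budget(s, c, eps)
print("eps_i^2:", [e ** 2 for e in budget])
print("constraint:", sum(ci ** 2 * ei ** 2 for ci, ei in zip(c, budget)))
print("objective:", objective(s, budget))
print("closed form:", (sum(ci * math.sqrt(si) for si, ci in zip(s, c)) / eps) ** 2)
```

Here the squared budgets are proportional to $\sqrt{s_i}/C_i$, in contrast with the cube-root rule of the Laplace case, and the objective matches the closed form $(\sum_i C_i \sqrt{s_i})^2 / \epsilon^2$.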



B Fourier strategy-Uniform Noise 

In this section, we provide a tighter analysis of the expected noise that results from using the Fourier
strategy with uniform noise. This is to be compared with the bound of
$O(2^{\|\alpha\|} |B| \log(|B|/\delta) / \epsilon)$ stated in [1, Theorem 7].

Theorem B.1. Let the query workload $Q$ consist of marginals $C^{\alpha_1}, \ldots, C^{\alpha_\ell}$ and $B$ be the set of Fourier
coefficients corresponding to this workload, such that $B = \{\beta \mid \exists\, C^{\alpha_i} : \beta \preceq \alpha_i\}$. Then if we release all
Fourier coefficients in $B$ via the Laplace mechanism with uniform noise and use them to compute private
values of the marginals $\tilde{C}^{\alpha_1}, \ldots, \tilde{C}^{\alpha_\ell}$, for each marginal $C^{\alpha_i}$ bounds on the noise per marginal can be given
as:

1. $E \|C^{\alpha_i} x - \tilde{C}^{\alpha_i}\|_1 = O\left( |B| \sqrt{2^{\|\alpha_i\|}} / \epsilon \right)$.

2. $\|C^{\alpha_i} x - \tilde{C}^{\alpha_i}\|_1 = O\left( |B| \sqrt{2^{\|\alpha_i\|} \log(|B|/\delta)} / \epsilon \right)$ with probability at least $1 - \delta$.



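Proof. Let $\alpha = \alpha_i$ and w.l.o.g. assume that $x = 0$, so:

$\|C^\alpha x - \tilde{C}^\alpha\|_1 = \|\tilde{C}^\alpha\|_1 = \sum_{\gamma \preceq \alpha} \left| \sum_{\beta \preceq \alpha} \phi_\beta (-1)^{\langle \beta, \gamma \rangle} 2^{d/2 - \|\alpha\|} \right|$ (8)

By the symmetry of the Laplace random variables $\phi_\beta$, each inner sum has the same distribution as
$2^{d/2 - \|\alpha\|} \sum_{\beta \preceq \alpha} \phi_\beta$. Note that for iid unbiased
random variables $Y$, $E[|Y|] = E[\sqrt{Y^2}] \leq \sqrt{E[Y^2]} = \sqrt{\mathrm{Var}(Y)}$, using Jensen's inequality. Hence, using
the linearity of expectation and the fact that $\mathrm{Var}(\phi_\beta) = \frac{8 |B|^2}{2^d \epsilon^2}$, we have:

$E \|C^\alpha x - \tilde{C}^\alpha\|_1 = 2^{d/2} \, E \left| \sum_{\beta \preceq \alpha} \phi_\beta \right| \leq 2^{d/2} \sqrt{2^{\|\alpha\|} \mathrm{Var}(\phi_\beta)} = \frac{\sqrt{2^{3 + \|\alpha\|}} \, |B|}{\epsilon}$ (9)

To get a concentration bound for the second part of the Theorem, we use the fact that with probability at
least $1 - \delta$, $\left| \sum_\beta \phi_\beta \right| = O\left( \sqrt{\sum_\beta \mathrm{Var}(\phi_\beta) \log(1/\delta)} \right)$.
Substituting this in (8), we have with probability $1 - \delta$,

$\|C^{\alpha_i} x - \tilde{C}^{\alpha_i}\|_1 = O\left( \sqrt{2^{\|\alpha_i\|}} \, |B| \log^{1/2}(1/\delta) / \epsilon \right)$.

Rescaling $\delta$ by a factor of $|B|$ means that by a union bound this holds for all $\alpha_i$. □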



C Omitted Proofs 



Proof of Lemma 4.2. We have $q = 2^k \binom{d}{k}$, $m = \sum_{i=0}^{k} \binom{d}{i}$ and $N = 2^d$. For the set of all $k$-way marginals
the matrices $Q \in \mathbb{R}^{q \times N}$, $S \in \mathbb{R}^{m \times N}$ and $R \in \mathbb{R}^{q \times m}$ have the following entries:

$Q_{(i,t),j} = 1$ if $i \wedge j = t$, and $0$ otherwise,

$S_{ij} = (-1)^{\langle i, j \rangle} / 2^{d/2}$,

$R_{(i,t),j} = (-1)^{\langle j, t \rangle} 2^{d/2 - k}$ if $j \preceq i$, and $0$ otherwise,

where we abuse notation indexing entries by vectors $i, j, t \in \{0, 1\}^d$ only if entries corresponding to respective
subsets exist. Here, $t$ indexes marginals. Formally, we use all pairs $(i, t)$, where $\|i\| = k$ and $t \preceq i$
to index over $[q]$. We use an index $i$, where $\|i\| \leq k$, to index over $[m]$ and a regular index $0 \leq i < 2^d$ to
index over $[N]$. The grouping number of $S$ is equal to $m$ because the groups consist of individual rows of $S$,
so for all $1 \leq i \leq m$ we have $C_i = 1/2^{d/2}$ and $b_i = 2 \sum_{j=1}^{q} a_j R_{ji}^2$. Using $a = 1_q$ and substituting entries
of $R$, we have $b_i = 2^{d-k+1} \cdot \binom{d - \|i\|}{k - \|i\|}$, where we use index $i$ as described above. Thus,

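$\sum_{i=1}^{m} b_i^{1/3} = 2^{(d-k+1)/3} \sum_{i=0}^{k} \binom{d}{i} \binom{d-i}{k-i}^{1/3}$.

Now, using Corollary 3.3 for $\epsilon$-differential privacy (and assuming $k \leq d/2$) we get the sum of the noise
variances over all entries equal to

$\frac{1}{2^{k-1} \epsilon^2} \left( \sum_{i=0}^{k} \binom{d}{i} \binom{d-i}{k-i}^{1/3} \right)^3 \leq \frac{3(k+1)^2}{2^{k-1} \epsilon^2} \sum_{i=0}^{k} \binom{d}{i}^3 \binom{d-i}{k-i}$

$= \frac{3(k+1)^2}{2^{k-1} \epsilon^2} \sum_{i=0}^{k} \binom{d}{i}^2 \binom{d}{k} \binom{k}{i}$

$= \frac{3(k+1)^2 \binom{d}{k}}{2^{k-1} \epsilon^2} \sum_{i=0}^{k} \binom{d}{i}^2 \binom{k}{i} \leq \frac{3(k+1)^2 \binom{d}{k}^2}{2^{k-1} \epsilon^2} \sum_{i=0}^{k} \binom{d}{i} \binom{k}{i}$

$= \frac{3(k+1)^2 \binom{d}{k}^2 \binom{d+k}{k}}{2^{k-1} \epsilon^2}$.

Here we used the identity $\binom{d}{i} \binom{d-i}{k-i} = \binom{d}{k} \binom{k}{i}$, the bound $\binom{d}{i} \leq \binom{d}{k}$ for
$i \leq k \leq d/2$, and $\sum_{i=0}^{k} \binom{d}{i} \binom{k}{i} = \binom{d+k}{k}$.
Thus, dividing by $q$, the variance of noise per entry of a marginal table is
$O\left( \frac{k^2 \binom{d}{k} \binom{d+k}{k}}{2^{2k} \epsilon^2} \right)$. By Jensen's
inequality, the expected magnitude of noise per entry is
$O\left( \frac{k}{2^k \epsilon} \sqrt{\binom{d}{k} \binom{d+k}{k}} \right)$; summing over the $2^k$ entries of a marginal
gives the bound stated in the lemma.

For $(\epsilon, \delta)$-differential privacy Corollary 3.3 gives variance at most

$O\left( \frac{\log(1/\delta)}{2^k \epsilon^2} \left( \sum_{i=0}^{k} \binom{d}{i} \binom{d-i}{k-i}^{1/2} \right)^2 \right) \leq O\left( \frac{\log(1/\delta)}{2^k \epsilon^2} (k+1) \sum_{i=0}^{k} \binom{d}{i}^2 \binom{d-i}{k-i} \right)$

$= O\left( \frac{k \log(1/\delta)}{2^k \epsilon^2} \binom{d}{k} \sum_{i=0}^{k} \binom{d}{i} \binom{k}{i} \right) = O\left( \frac{k \log(1/\delta)}{2^k \epsilon^2} \binom{d}{k} \binom{d+k}{k} \right)$,

which, after dividing by $q$ and applying Jensen's inequality as above, gives
$O\left( \frac{1}{2^k \epsilon} \sqrt{k \binom{d+k}{k} \log(1/\delta)} \right)$ expected noise per marginal entry, i.e.,
$O\left( \frac{1}{\epsilon} \sqrt{k \binom{d+k}{k} \log(1/\delta)} \right)$ per marginal. □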